[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy #12256

Closed

rxin commented Apr 8, 2016

What changes were proposed in this pull request?

Based on our tests, gzip decompression is very slow (< 100 MB/s), making queries decompression-bound. Snappy can decompress at ~500 MB/s on a single core.
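To put those two figures side by side, here is a rough back-of-the-envelope sketch. The 10,000 MB input size is a made-up number chosen only for illustration, and the object and method names are hypothetical; the rates are the ones quoted above, taken at face value for a single core.

```scala
object DecompressionEstimate {
  /** Seconds one core needs to decompress `megabytes` of data at `mbPerSec`. */
  def seconds(megabytes: Double, mbPerSec: Double): Double = megabytes / mbPerSec

  def main(args: Array[String]): Unit = {
    val dataMb = 10000.0      // hypothetical input size, illustration only
    val gzipRate = 100.0      // upper bound quoted above for gzip (MB/s)
    val snappyRate = 500.0    // approximate figure quoted above for snappy (MB/s)

    // gzip is quoted as *below* 100 MB/s, so its time is a lower bound.
    println(f"gzip:   at least ${seconds(dataMb, gzipRate)}%.0f s")
    println(f"snappy: about ${seconds(dataMb, snappyRate)}%.0f s")
  }
}
```

Under these assumptions the gap is at least a factor of five in decompression time per core, which is what makes a query decompression-bound with gzip.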

This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).
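As a minimal sketch of what a per-data-source options class can look like, the following resolves the codec from the write options and falls back to a session-level default. It is not the `ParquetOptions` class from this patch: the class name `ParquetWriteOptions`, the `"compression"` key, and the set of codec names are assumptions made for illustration.

```scala
import java.util.Locale

// Hypothetical stand-in for a per-write options holder; NOT Spark's ParquetOptions.
final class ParquetWriteOptions(
    parameters: Map[String, String],
    sessionDefaultCodec: String) {

  import ParquetWriteOptions.knownCodecs

  /** The per-write "compression" option if present, otherwise the session default. */
  val compressionCodec: String = {
    val requested =
      parameters.getOrElse("compression", sessionDefaultCodec).toLowerCase(Locale.ROOT)
    // Fail early on an unknown name instead of silently picking a fallback codec.
    require(
      knownCodecs.contains(requested),
      s"Unknown codec '$requested'; expected one of: ${knownCodecs.mkString(", ")}")
    requested
  }
}

object ParquetWriteOptions {
  // Assumed set of short codec names, for illustration only.
  val knownCodecs: Set[String] = Set("uncompressed", "snappy", "gzip", "lzo")
}
```

With a session default of `"snappy"`, `new ParquetWriteOptions(Map.empty, "snappy").compressionCodec` yields `"snappy"`, while passing `Map("compression" -> "GZIP")` yields `"gzip"`. Keeping the lookup in one small class puts the default and the validation in a single place.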

How was this patch tested?

Should be covered by existing unit tests.

@@ -136,14 +125,7 @@ private[sql] class DefaultSource
sqlContext.conf.writeLegacyParquetFormat.toString)

// Sets compression scheme
conf.set(
Contributor Author:

the old nested map, getOrElse was super confusing...

@liancheng / @HyukjinKwon let's avoid doing things like this in the future
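For readers without the removed lines at hand, here is a hypothetical illustration of the readability concern. It does not reproduce the code that was deleted; the table, the `"codec"` key, and the defaults are invented. Both methods return the same result, but the second names each decision.

```scala
import java.util.Locale

object CodecLookup {
  // Hypothetical short-name table, for illustration only.
  val shortNames: Map[String, String] = Map("gzip" -> "GZIP", "snappy" -> "SNAPPY")

  // Nested style: one getOrElse is buried in the argument list of another.
  def nested(settings: Map[String, String]): String =
    shortNames.getOrElse(
      settings.getOrElse("codec", "snappy").toLowerCase(Locale.ROOT),
      "UNCOMPRESSED")

  // Flatter style: the requested name is resolved first, then looked up.
  def flat(settings: Map[String, String]): String = {
    val requested = settings.getOrElse("codec", "snappy").toLowerCase(Locale.ROOT)
    shortNames.getOrElse(requested, "UNCOMPRESSED")
  }
}
```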

Member:

Sorry, I will.

SparkQA commented Apr 8, 2016

Test build #55333 has finished for PR 12256 at commit 70ea5f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ParquetOptions(

rxin commented Apr 8, 2016

cc @nongli

nongli commented Apr 8, 2016

LGTM

rxin commented Apr 9, 2016

Merging in master.
